An empirical data analysis of “price runs” in daily financial indices: Dynamically assessing market geometric distributional behavior

In financial time series there are time periods in which market index values or asset prices increase or decrease monotonically. We call those events “price runs”, “elementary uninterrupted trends” or just “uninterrupted trends”. In this paper we study the distribution of the duration of uninterrupted trends for the daily indices DJIA, NASDAQ, IPC and Nikkei 225 during the period from 10/30/1978 to 08/07/2020, and we compare the simple geometric statistical model with p = 1/2, consistent with the EMH, to the empirical data. By a fitting procedure, it is found that the geometric distribution with parameter p = 1/2 provides a good model for uninterrupted trends of short and medium duration for the more mature markets; however, the longest-duration events still need to be statistically characterized. Estimated values of the parameter p were also obtained and confirmed by calculating the mean value of the fluctuations of p from empirical data. Additionally, the observed trend duration distributions for the different studied markets are compared over time, by means of the Anderson-Darling (AD) test, to the expected geometric distribution with parameter p = 1/2 and to a geometric distribution with a free parameter p. This makes it possible to assess and compare the geometric behavior of different markets at different dates, as well as to measure the fraction of time during which run durations in the studied markets are consistent with the geometric distribution, both with p = 1/2 and with a free parameter p.


Introduction
Financial-market analysis studies the movements of asset prices and financial indices. Extracting a profit from these movements is an important activity in the financial industry, and a large variety of methods intended to predict market behavior have been developed over the years, ranging from complex mathematical models to pseudo-scientific techniques [1]. An important approach is the statistical analysis of large data sets, now partially available to small investors as well, thanks to the increasing availability of computing power and high-quality data. This analysis has benefited from the contributions not only of economists, but also of many physicists and mathematicians who have applied methods and ideas of probability theory and statistical physics to finance. As an academic result of these efforts, a set of universal, nontrivial statistical properties of historical financial data, persistent over time, has been observed and called "stylized facts" [2,3]. When looking at the price values of an asset on a financial time series chart, it is common to observe "price trends" in which most of the values are larger (or smaller) than the previous ones. These trends can be seen as composed of uninterrupted elementary trends, i.e., periods in which the value increases or decreases monotonically. Trends are a popular subject within so-called technical analysis. According to the followers of technical analysis, the chartists, patterns in the trend direction of financial data are indicators of changes in market direction and of the future behavior of prices. The effectiveness of this approach to financial markets is disputed and called into question by the Efficient Market Hypothesis (EMH), which states that current prices reflect available information.
Elementary uninterrupted trends are the main subject of the present work, in which we study empirically a basic random process, consistent with the EMH, that allows us to quantify trend directions in financial time series. From now on, we call these elementary uninterrupted trends simply "trends" or, more specifically, "uptrends" or "downtrends", depending on their direction. Empirical studies of financial and economic data are becoming increasingly relevant for the following reasons: 1) Dozens of stylized facts have been observed so far, and more are still being discovered. 2) The study and prediction of stylized facts by means of multi-agent market models is an important area of research in Finance and Econophysics. 3) Stylized facts are an important tool to validate proposed numerical and multi-agent market models. 4) At present, we still lack a general, microscopic theory or model to explain the origin of stylized facts; we think simulation methodologies using agents could be useful in the construction of such a general theory. Some interesting references on these issues are [3][4][5][6][7][8][9][10].
Before going further, it is necessary to present some preliminary and basic definitions. In subsections Definitions and The Efficient Market Hypothesis these definitions and other useful information will be presented. In section An 'Efficient Market' toy model for the distribution of run durations, a model for the distribution of trends duration will be developed consistently with the EMH. Section Data sample and methodology will explain how the data were analyzed and section Data analysis will provide an interpretation of the analysis.

Definitions
Given a financial time series of asset prices or index values, S(1), S(2), ..., S(n), let X(t) = log S(t) be the logarithm of its terms, where t = 1, 2, ..., n. A common quantity used to study price variations in financial time series is the log-return, defined at time t as

$$r(t, \Delta t) \equiv X(t + \Delta t) - X(t) \tag{1}$$

for a given time sampling scale Δt. If the price variation is small, the log-return is a good approximation of the return

$$R(t, \Delta t) = \frac{S(t + \Delta t) - S(t)}{S(t)}. \tag{2}$$

In this paper, we consider Δt equal to 1 day and we use the values of the indices corresponding to the close value in the investigated markets. More details on the data set will be given in section Data sample and methodology.
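As a small illustration of the two definitions above, the following sketch computes log-returns and simple returns for a short list of closing values and shows that they nearly coincide when price variations are small. The closing values are hypothetical, and the function names are introduced here for illustration only.

```python
import math

def log_returns(prices, dt=1):
    """Log-returns r(t, dt) = X(t + dt) - X(t), with X(t) = log S(t)."""
    logs = [math.log(s) for s in prices]
    return [logs[t + dt] - logs[t] for t in range(len(logs) - dt)]

def simple_returns(prices, dt=1):
    """Returns R(t, dt) = (S(t + dt) - S(t)) / S(t)."""
    return [(prices[t + dt] - prices[t]) / prices[t]
            for t in range(len(prices) - dt)]

closes = [100.0, 101.0, 100.5, 102.0]  # hypothetical daily closing values
r = log_returns(closes)
R = simple_returns(closes)

# For small daily variations the two quantities differ only slightly.
for r_t, R_t in zip(r, R):
    print(f"log-return = {r_t:+.6f}   return = {R_t:+.6f}")
```

Note that log-returns are additive over time, so their sum over the sample equals the logarithm of the ratio of the last to the first value.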
An elementary trend of duration k is defined as a subseries of k + 1 values within the given time series S(t) in which every value is greater than (for an uptrend) or smaller than or equal to (for a downtrend) the preceding one. An example is shown in Fig 1 for the prices of the DJIA in a time period between October 1978 and January 1979. The duration of an elementary downward/upward trend in daily data is the number of days before the price changes direction; i.e., if the price change does not change sign from one day to the next, the corresponding trend continues. Focusing our attention on the red points in this figure, we see first an uninterrupted downtrend one day long, followed by a three-day uptrend, then a downtrend with a duration of two days, a three-day uptrend, etc. By construction, uptrends and downtrends appear alternately in the original time series S(t).
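The definition of elementary trends can be sketched in a few lines of code. The function below is only an illustration (its name and the sample prices are hypothetical): a step counts as "up" when the value strictly increases and as "down" otherwise, so that unchanged prices extend a downtrend, as in the definition above. The first and last runs of a finite sample may be truncated by the sample boundaries.

```python
def run_durations(prices):
    """Split a price series into alternating elementary trends.

    Returns (up, down): the durations, in steps, of the uptrends and
    downtrends in order of appearance. A step is 'up' if the price
    strictly increases; a decrease or no change counts as 'down'.
    """
    up, down = [], []
    current_is_up, length = None, 0
    for prev, curr in zip(prices, prices[1:]):
        step_is_up = curr > prev
        if step_is_up == current_is_up:
            length += 1                      # the current trend continues
        else:
            if length:                       # close the previous trend
                (up if current_is_up else down).append(length)
            current_is_up, length = step_is_up, 1
    if length:                               # close the last trend
        (up if current_is_up else down).append(length)
    return up, down

# Hypothetical prices: a 1-day downtrend, a 3-day uptrend,
# a 2-day downtrend and a final 1-day uptrend.
up, down = run_durations([10, 9, 10, 11, 12, 11, 10, 11])
print(up, down)
```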
Here, we present a detailed statistical study of these short elementary trends using market closing price values from four different indices over a time sampling scale of Δt = 1 day for the period between October 30, 1978 and August 07, 2020.

The Efficient Market Hypothesis
The Efficient Market Hypothesis (EMH), first stated by Eugene F. Fama in 1970 [11], claims that the market quickly finds the rational price for a traded asset [12], as the current value incorporates all possible information about the price in the future. The most important consequence of this hypothesis was shown by P. Samuelson [13]: the best forecast for the future price of an asset is its present price,

$$E(S(t + \Delta t) \mid \mathcal{F}_t) = S(t), \tag{3}$$

where $E(\cdot \mid \mathcal{F}_t)$ is the conditional expectation with respect to the filtration $\mathcal{F}_t$, namely with respect to the known history up to time t. Indeed, it is easy to derive the above form of the EMH starting from a simple statistical no-arbitrage argument. Suppose we have two assets, a risky one with price S(t) and a risk-free one giving a constant interest rate $r_F$. To avoid arbitrage, one has to require that the expected return of the risky asset is equal to the risk-free interest rate, that is

$$E(R(t, \Delta t) \mid \mathcal{F}_t) = r_F, \tag{4}$$

where R(t, Δt) was defined in Eq (2), assuming for simplicity that no dividends are paid in the time interval Δt. The latter equation immediately yields, for non-vanishing S(t),

$$E(S(t + \Delta t) \mid \mathcal{F}_t) = (1 + r_F)\, S(t), \tag{5}$$

which reduces to Eq (3) for $r_F = 0$. Eqs (3) and (5), jointly with the integrability of the process S(t), are known as the martingale and sub-martingale conditions, respectively (remember that, under normal conditions, $r_F \geq 0$, even if interest rates can be negative). From a technical point of view, one has to further assume integrability of the price process ($E[|S(t)|] < \infty$) together with Eq (3) or (5); Eq (5) together with integrability means that the discounted price is a martingale when $r_F > 0$. Please notice that Eqs (3) and (5) do not uniquely specify a random process for S(t), but one can prove that, if they hold, then returns must be uncorrelated.
In financial data, square returns or absolute returns turn out to be correlated (with long-range correlations), but this stylized fact does not falsify the EMH even if it is the main reason for the popularity of ARCH/GARCH models in financial econometrics [14,15]. The EMH invalidates the pretence of technical analysis to predict future prices or trends; in fact, in Samuelson's words, "there is no way of making an expected profit by extrapolating past changes in the futures price, by chart or any esoteric devices of magic or mathematics" [13].

An 'Efficient Market' toy model for the distribution of run durations
Among all the possible statistical models that can describe price fluctuations, the geometric random walk is the simplest one. A geometric random walk is just a product of independent and identically distributed positive random variables. If the expected value of these variables is 1, then the geometric random walk is a martingale; otherwise, if the expected value is larger than 1, the geometric random walk is a submartingale. However, the geometric random walk hypothesis is neither necessary nor sufficient for an efficient market, as shown by many authors, among whom Leroy [16], Lucas [17] and Lo and Mackinlay [1]. Again, to see this point, it is enough to consider that Eq (5) allows for any (sub)-martingale model.
To study our trends, note that at each step of a time series of price or index values there are three possible outcomes: increase, no change and decrease; the second one, however, does not change the direction of prices. We therefore consider only two possible outcomes: either the time series increases or it does not increase. In an efficient market, the expected future price depends only on information about the current price, not on its previous history. Therefore, it should be impossible to predict the expected direction of a future price change given the history of the price process. In formulas, from Eq (3) (after discounting for the risk-free rate), we have

$$E(S(t + \Delta t) - S(t) \mid \mathcal{F}_t) = 0; \tag{6}$$

therefore, if we consider the sign of the price change Y(t, Δt) = sign(S(t + Δt) − S(t)), which coincides with the sign of returns, we accordingly have

$$E(Y(t, \Delta t) \mid \mathcal{F}_t) = 0. \tag{7}$$

If the price follows a geometric random walk, then the series of price-change signs can be modeled as a Bernoulli process. This process could be biased to take the presence of a risk-free interest rate into account. To be more specific, let us consider a log-normal geometric random walk and let us use the assumption Δt = 1. Let $S_0$ be the initial price. The price at time t will be given by

$$S(t) = S_0 \prod_{i=1}^{t} Q_i, \tag{8}$$

where the $Q_i$ are independent and identically distributed random variables following a log-normal distribution with parameters μ and σ. These two parameters come from the corresponding normal distribution for log-returns.
As a direct consequence of the EMH in the form of Eq (5), we have

$$E(Q_i) = 1 + r_F, \tag{9}$$

whilst for a log-normally distributed random variable the expected value is

$$E(Q_i) = \exp\left(\mu + \frac{\sigma^2}{2}\right). \tag{10}$$

By combining these two equations, the following dependence between the parameters is found:

$$\sigma = \sqrt{2\left(\log(1 + r_F) - \mu\right)}, \tag{11}$$

which allows us to compare the parameters estimated from the distributions of the index price returns, as the values of the two parameters μ and σ come from the corresponding normal distribution for log-returns of the price or index data. Further, from the cumulative distribution function of a log-normal random variable, we find that the probability of a negative sign of the return is given by

$$q = P(Q_i < 1) = \Phi\left(-\frac{\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(-\frac{\mu}{\sigma\sqrt{2}}\right)\right], \tag{12}$$

where Φ is the standard normal cumulative distribution function. For typical markets, from 1978 to 2020, the value of the daily risk-free rate of return oscillated in the range $0 < r_F \leq 2.5 \times 10^{-4}$. Eq (11) can be tested using the values of $r_F$ and the estimates of μ and σ, as can be seen in Fig 2. Using these values for $r_F$, the probability of negative returns is found to be q = 0.5±0.02; thus the Bernoulli process seems a reasonable first approximation for the probability of a change in sign for the return data.
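A short numerical sketch may help to see the orders of magnitude involved. It uses the relation of Eq (11), σ = sqrt(2 (log(1 + r_F) − μ)), and the fact that, for a log-normal variable Q with parameters μ and σ, P(Q < 1) = Φ(−μ/σ). The value of σ below is a hypothetical daily volatility chosen only for illustration, and the function names are not taken from the original analysis.

```python
import math

def mu_from_emh(sigma, r_f):
    """Invert Eq (11): mu = log(1 + r_F) - sigma**2 / 2."""
    return math.log1p(r_f) - 0.5 * sigma ** 2

def sigma_from_emh(mu, r_f):
    """Eq (11): sigma = sqrt(2 * (log(1 + r_F) - mu))."""
    return math.sqrt(2.0 * (math.log1p(r_f) - mu))

def prob_negative_return(mu, sigma):
    """P(Q < 1) = Phi(-mu / sigma) for log Q ~ Normal(mu, sigma**2)."""
    return 0.5 * (1.0 + math.erf(-mu / (sigma * math.sqrt(2.0))))

r_f = 2.5e-4    # upper end of the daily risk-free range quoted in the text
sigma = 0.01    # hypothetical daily volatility, for illustration only
mu = mu_from_emh(sigma, r_f)
q = prob_negative_return(mu, sigma)
print(f"mu = {mu:.3e}, q = {q:.4f}")
```

With these illustrative inputs q falls slightly below 1/2, i.e. within the interval 0.5±0.02 quoted above.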
Under this framework, it becomes natural to use the biased Bernoulli process as the null hypothesis for the time series of sign changes of the log-returns [18]. It is known that the distribution of the number x of failures needed to get one success for a Bernoulli process with success probability p = 1 − q is the geometric distribution G(p). The probability of observing x failures is then given by

$$P(X = x) = (1 - p)^{x}\, p, \qquad x = 0, 1, 2, \ldots$$

The duration of an elementary downward trend in daily data is the number of days before the price increases, so the distribution of such trend durations should follow a geometric distribution. An identical argument applies to the duration of an upward trend. Note that such sequences of identical outcomes are also known as runs or clumps in the mathematical literature. Some historical references on this subject are [19][20][21]. Chapter X of the first reference was for many years the classical textbook reference on the theory of runs; the second reference presents an interesting statistical test, based on the properties of runs, to demonstrate that two sets of independent observations corresponding to two independent random variables have the same distribution; finally, the third reference presents an intensive treatment of the theory of runs that is still of current interest.
In the next section we describe the data used for testing the model just presented, and we discuss the goodness of fit test applied to compare the observed and expected (geometric) distributions of the durations of upward/downward price trends, whose direction coincides with the sign of the log-returns.
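The link between a Bernoulli process and geometric run durations can be checked with a small simulation. The sketch below is illustrative only: it assumes that a downtrend of duration k consists of k consecutive non-increasing steps ended by an increase, so that durations start at 1 and P(K = k) = q^(k−1) p, with mean 1/p. The function name, sample size and seed are arbitrary choices.

```python
import random
from collections import Counter

def simulate_down_runs(p_up, n_steps, seed=0):
    """Durations of completed downtrends in a simulated Bernoulli sign series."""
    rng = random.Random(seed)
    runs, length = [], 0
    for _ in range(n_steps):
        if rng.random() < p_up:   # price increases: any ongoing downtrend ends
            if length:
                runs.append(length)
            length = 0
        else:                     # price does not increase: downtrend continues
            length += 1
    return runs                   # a final unfinished run is discarded

runs = simulate_down_runs(p_up=0.5, n_steps=200_000)
counts = Counter(runs)
mean_duration = sum(runs) / len(runs)
frac_one_day = counts[1] / len(runs)

# Under p = 1/2 the mean duration should be close to 2 and about half of the
# downtrends should last a single day.
print(f"mean duration = {mean_duration:.3f}, P(K = 1) = {frac_one_day:.3f}")
```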

Data sample and methodology
In this work, daily closing values of four financial indices were analyzed, namely the Dow Jones Industrial Average (DJIA), the NASDAQ Composite, the Mexican Índice de Precios y Cotizaciones (IPC) and the Nikkei 225, for the period between October 30, 1978 and August 07, 2020. The full data sample for this time span is available as supplementary material; see S1 Dataset at the end of this paper. The number of analyzed records and the numbers of uninterrupted uptrends and downtrends found, as defined in subsection Definitions, are displayed in Table 1.

Fig 2. For these values the probability of a negative return is q = 0.5±0.02. Subfig 2(a) is the probability of a negative return given the risk-free interest rate r_f and mean σ. The σ parameter can be estimated from the return time series Q_i, and r_f is estimated from a reference asset that depends on the market being studied. The intersection with q = 0.5 is also shown. Subfig 2(b) is the daily risk-free rate of return r_F for the US market, developed markets and emerging markets. For the period ranging from 1978 to 2020 the daily risk-free rate oscillates in the interval [0, 0.061). Data were downloaded from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. https://doi.org/10.1371/journal.pone.0270492.g002

Remember that, by construction, for each data sample the numbers of uninterrupted uptrends and downtrends are the same if the analyzed financial time series has an even total number of trends, and they differ by one if the total number of trends is odd.
The composition of trends for each data sample is described in Tables 2-5. Additional and brief comments on the different duration of constructed uninterrupted trend data samples may be found in section Conclusions.
Finally, in Table 6, we show the descriptive statistics of the data presented in the current section. Values of the first four central moments are displayed. It can be seen that the mean value of the observed trend durations is close to two for all studied markets, that it is larger for less mature markets, and that the mean uptrend duration is slightly larger than the mean downtrend duration for all markets.

The Anderson-Darling goodness of fit test
In order to compare the observed and expected distributions of trend durations, the Anderson-Darling (AD) test described in references [22,23] was used. The AD test belongs to a family of goodness of fit tests called the Cramér-von Mises tests, which includes the Anderson-Darling test, Watson's test and the Cramér-von Mises test itself. The family was originally developed to test continuous distributions, but a generalization for discrete distributions appeared for the first time in an article by Choulakian et al. [23]. The Anderson-Darling test was found to be the most suitable for this purpose because it places more weight on the tails of a distribution than other goodness of fit tests. The principle behind this kind of test is to define a statistic that measures the distance between a theoretical distribution function F_0(k) and the empirical (cumulative) distribution function for n events, F_n(k). Every value of the statistic is associated with a p-value, which can be interpreted as the probability of obtaining a value of the statistic at least as large as the one obtained, given that the null hypothesis is true. If the p-value is smaller than a previously defined threshold value α, the null hypothesis is rejected. Two separate tests were applied to our data:
1. A test of whether the observed data come from a geometric distribution with p = q = 0.5. Based on the model outlined in section An 'Efficient Market' toy model for the distribution of run durations and on empirical evidence from the data, we interpret a rejection of this null hypothesis as evidence that the market is moving up or down in the investigated period.
2. A test of whether the observed data are drawn from a distribution belonging to the parametric family G(p). This tells us whether the up and down ticks can be modeled as a Bernoulli process.
These two tests will allow us to assess the validity of the Bernoulli hypothesis.
For a more complete discussion on the Anderson-Darling test for discrete data, including some comments about how it was applied to the geometric distribution case, see S1 Appendix.
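To make the idea of the test concrete, the following sketch computes a discrete Anderson-Darling statistic for binned trend durations and a Monte Carlo p-value under the geometric null hypothesis. It assumes the discrete form A² = (1/n) Σ_j Z_j² p_j / [H_j (1 − H_j)], where Z_j is the difference between cumulative observed and expected counts and H_j is the cumulative null probability. The binning with a lumped tail cell, the Monte Carlo p-value, and all function names and default parameters are assumptions made here for illustration; the procedure actually applied in this work is the one described in S1 Appendix.

```python
import random

def geometric_cell_probs(p, k_max):
    """Null probabilities of durations 1..k_max-1, tail lumped in the last cell."""
    probs = [p * (1 - p) ** (k - 1) for k in range(1, k_max)]
    probs.append(1.0 - sum(probs))
    return probs

def discrete_ad_statistic(observed, probs):
    """A^2 = (1/n) * sum_j Z_j^2 p_j / (H_j (1 - H_j)); last cell contributes 0."""
    n = sum(observed)
    stat, cum_obs, cum_prob = 0.0, 0, 0.0
    for o, pj in zip(observed[:-1], probs[:-1]):
        cum_obs += o
        cum_prob += pj
        z = cum_obs - n * cum_prob       # cumulative observed minus expected
        stat += z * z * pj / (cum_prob * (1.0 - cum_prob))
    return stat / n

def ad_pvalue(durations, p=0.5, k_max=12, n_sim=2000, seed=0):
    """Monte Carlo p-value of the discrete AD statistic under G(p)."""
    rng = random.Random(seed)
    probs = geometric_cell_probs(p, k_max)
    cells = range(1, k_max + 1)

    def bin_counts(sample):
        counts = [0] * k_max
        for d in sample:
            counts[min(d, k_max) - 1] += 1
        return counts

    a2_obs = discrete_ad_statistic(bin_counts(durations), probs)
    exceed = 0
    for _ in range(n_sim):
        sample = rng.choices(cells, weights=probs, k=len(durations))
        if discrete_ad_statistic(bin_counts(sample), probs) >= a2_obs:
            exceed += 1
    return a2_obs, (exceed + 1) / (n_sim + 1)

# A sample made only of one-day trends is clearly not geometric with p = 1/2.
a2, p_value = ad_pvalue([1] * 100, n_sim=200)
print(f"A^2 = {a2:.2f}, p-value = {p_value:.4f}")
```

For the free-parameter test, one natural variant is to replace p by an estimate obtained from each sample, both for the observed data and inside the simulation loop; this is not shown here.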

Data analysis
In order to motivate our analysis, in Fig 3 we show, for the four markets studied in this paper, the ratio of upward to total price changes in daily data plotted against time for the years 1978-2020. This ratio is calculated over an overlapping time window of 504 trading days shifted every 5 days. It can be seen that this ratio fluctuates closer to the value of 1/2 for DJIA and Nikkei, whereas for Nasdaq and IPC the fluctuations are greater than those expected for the same time windows in a Bernoulli process with parameter p = 1/2.
In what follows, we present several detailed studies of the four markets to see whether consecutive price increments/decrements with the same sign follow a geometric distribution. Firstly, after estimating the distributions of the durations of uninterrupted uptrends and downtrends for all data, we separately and independently fit a geometric distribution to each one of these distributions, where the observed sum of uptrends and downtrends of the same duration is the only constraint. For this reason, although we denote the estimated parameters by p and q for uptrend and downtrend durations respectively, this notation is only nominal, since we fit those parameters separately and independently, without constraining the geometric fits to comply with p + q = 1; i.e., we consider and analyze the sequences of uninterrupted uptrend and downtrend durations separately. For this reason, we sometimes refer only to the parameter p in our discussion. More on this point can be found at the end of the current section.
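The rolling ratio described above can be sketched as follows. The window of 504 and the shift of 5 are taken from the text; whether the window is counted in trading days or in price changes is an assumption here (the sketch counts price changes), and the function name and the synthetic price series are hypothetical.

```python
def rolling_up_ratio(prices, window=504, step=5):
    """Ratio of upward to total price changes over overlapping windows."""
    ups = [1 if b > a else 0 for a, b in zip(prices, prices[1:])]
    ratios = []
    for start in range(0, len(ups) - window + 1, step):
        ratios.append(sum(ups[start:start + window]) / window)
    return ratios

# Synthetic series that alternates up and down every day: the ratio is
# exactly 1/2 in every window of even length.
prices = [100 + (i % 2) for i in range(1000)]
ratios = rolling_up_ratio(prices)
print(len(ratios), min(ratios), max(ratios))
```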
Analyzed empirical data can be consulted in Tables 2-5, and their corresponding geometric fits are displayed in Fig 4 for all probability distributions of uptrend and downtrend durations corresponding to the four different indices studied here. A Maximum Likelihood Fit (MLF) was applied. The results of these fits can be consulted in Table 7. In Fig 4, black solid small circles represent observations, the geometric fit corresponds to the red solid line and, as a visual guide, blue dashed lines indicate a geometric distribution with parameter p = q = 0.5.
In order to obtain a good fit, with appropriate and correct p and χ² values, the fitting procedure was applied to the region of the plots where no gaps of null events were observed in the trend duration distributions; i.e., the region starting at the first trend duration with zero events was excluded from the fit. The cut-offs applied are also shown in all the corresponding plots. From Table 7, it can be seen that, although all discussed markets display some extreme trend durations deviating to different degrees from the geometric model, for the whole of our data sample the distributions of increasing and decreasing trend durations for DJIA and Nikkei can be fitted reasonably well by a geometric distribution with p = 1/2, while the corresponding empirical run distributions of Nasdaq and IPC are also reasonably fitted by a geometric distribution with parameters not necessarily equal to 1/2. More on these facts will be discussed below and in the next sections. From the above fits, we can rank the markets in order of increasing distance from the p = 0.5 model, with Nikkei being the closest, followed by the DJIA, then the Nasdaq, and the IPC being the most distant during the analyzed time period.
Finally, although for trends of small and medium size, especially for the more mature markets DJIA and Nikkei, the geometric model with p = 1/2 is a good approximation, it is not possible to conclude that the conditions p = q = 1/2 and p + q = 1 are fulfilled in general for all markets and every run duration. Classical analyses [24] reported periods where, for runs of the monthly DJIA index, p = 0.57 and q = 0.43 (1897-1929). In contrast, the S&P monthly composite index is consistent neither with the equal probability condition p = q = 1/2 nor with the probability conservation condition p + q = 1: for example, for the period January 1871 to December 1917, p = 0.67 and q = 0.50 for this index, and for the time span January 1918 to March 1956, p = 0.6 and q = 0.60. These results suggest that, for these cases and the financial indices examined in [24], we are not dealing with a random process with p = q = 1/2. In the classical reference [24], p and q values are not estimated by a fitting procedure such as the one performed here; instead, the relative frequencies of up and down events of the indices are calculated at different time scales.
Further empirically observed deviations from the hypothesis p = q = 1/2 are reported in [25][26][27]. For an interesting and more modern analysis of runs in high-frequency financial data, see [28].

Time variation of p and q and other estimates of these parameters
In order to gain better insight into how our empirical trend duration distributions dynamically differ from the theoretical geometric distribution, the evolution in time of the p and q values is plotted in the upper and lower left panels of Fig 5, respectively. These parameters are independently calculated over a time window of 252 trading days shifted every ten days, separately over the sequences of observed uninterrupted uptrend and downtrend durations. We see that their values tend to oscillate around 1/2 and that the markets with p and q values closer to 1/2 are DJIA and Nikkei. Empirical distributions of the calculated values of p and q for all markets are displayed in the upper and lower right panels of the same Fig 5, respectively. The corresponding means and standard deviations of the p and q values are displayed in Table 8.

PLOS ONE

Table 7. Fitted p and q parameters of the geometric model. DJIA and Nikkei empirical trend duration distributions are well fitted by the geometric model. Nasdaq and IPC are not. NDF means "number of degrees of freedom". Fits were performed on the data listed in Tables 2-5.
Here it is important to mention that the estimates of p and q shown in Table 8 serve, at this point, only to corroborate the values obtained by the geometric fitting procedure. We mention this because, notwithstanding the independence of both measurements and the agreement between the values displayed in Tables 7 and 8, the entries of the distributions shown in the two right histograms in Fig 5 are not all statistically independent, since they were calculated using a rolling time window of 252 days, as described above. In addition, the p and q values show a certain degree of non-stationarity and, finally, these results can depend on the choice of the time-window size. Even considering these three facts, the agreement between the estimates obtained by these two different procedures over different time frames is remarkable. Taking these facts into consideration, and in order to confirm the quality of our estimation, we show in Table 9 the results obtained this time using non-overlapping time windows of, again, 200, 252 and 300 days.

To end this subsection, we observe that the distance from the p = 0.5 model for the different studied markets, established at the end of section Data analysis by the geometric fitting procedure, is again confirmed by the values of p and q shown in Table 9.
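The windowed estimates discussed in this subsection can be illustrated with the sketch below. It is not the procedure actually used here: it assumes that durations are counted from 1, so that the geometric parameter of a window is estimated as the inverse of the mean run duration, and that runs cut by the window edges are simply kept as truncated runs. Setting step equal to window gives non-overlapping windows, while a smaller step (e.g. 10) gives a rolling window. All names and the synthetic series are hypothetical.

```python
import statistics

def step_signs(prices):
    """True for a strictly increasing step, False otherwise."""
    return [b > a for a, b in zip(prices, prices[1:])]

def run_lengths(signs, target):
    """Lengths of maximal runs of `target` in a boolean sequence."""
    lengths, length = [], 0
    for s in signs:
        if s == target:
            length += 1
        elif length:
            lengths.append(length)
            length = 0
    if length:
        lengths.append(length)
    return lengths

def windowed_p(prices, window=252, step=252, target=True):
    """Per-window estimate 1 / (mean run duration) for runs of `target` steps."""
    signs = step_signs(prices)
    estimates = []
    for start in range(0, len(signs) - window + 1, step):
        runs = run_lengths(signs[start:start + window], target)
        if runs:
            estimates.append(len(runs) / sum(runs))
    return estimates

# Synthetic series: two up days followed by two down days, repeated.
prices = [100]
for i in range(1008):
    prices.append(prices[-1] + (1 if i % 4 < 2 else -1))

p_up = windowed_p(prices, target=True)      # estimates from uptrend durations
p_down = windowed_p(prices, target=False)   # estimates from downtrend durations
print(statistics.mean(p_up), statistics.stdev(p_up))
```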

Mean value and variance of p + q distribution
Even though we calculate p and q independently and use this notation in a nominal way, in this subsection, for completeness, we carefully study the probability conservation condition that a geometric stochastic process must meet, i.e. p + q = 1. To see what is happening, we show in Fig 6(a) the behavior of p + q as a function of time, calculated as explained before, using a rolling time frame of 252 trading days shifted every 10 days. It can be seen that p + q for DJIA and Nikkei oscillates around 1, whereas for IPC and Nasdaq it gets closer to this value over time and then, after the year 2000, follows the same behavior as for DJIA and Nikkei. The upper right panel of the same figure shows the empirical distribution of p + q for all studied markets. Non-stationarity effects are observed in all of them to different degrees; however, the mean value of p + q is close to 1 in all those distributions. Cutting off all data prior to the year 2000 and repeating this analysis, it is observed that p + q for all markets indeed fluctuates more closely around the value of 1, and that even the runs of the IPC and Nasdaq markets become closer to geometric as time passes. Distributions of these fluctuations are plotted in Fig 6(d). The corresponding means and standard deviations of the p + q distributions, for all analyzed data and for data restricted to after the year 1999, can be seen in Table 10, where we have also calculated these mean values for rolling time windows of 200 and 300 trading days.
In the same way as in the previous subsection Time variation of p and q and other estimates of these parameters, we calculate mean values and RMS of p + q for non-overlapping time frames of 200, 252 and 300 days. The values obtained are shown in Table 11, calculated both for the whole time period of the recorded data sample and for the time span after the year 2000.
For the full period of all analyzed data, and from the measurements shown in Tables 10 and 11, we can rank the studied markets in the following order of closeness to the geometric distribution: 1) DJIA, 2) Nikkei, 3) Nasdaq and lastly 4) IPC.

Estimation of <p> and σ_p. We have estimated p by applying the usual one-parameter fitting procedure illustrated in section Data analysis. We have seen that the estimated values of p for the different data samples are compatible with the corresponding values of <p> obtained by averaging data over the rolling overlapping and non-overlapping time windows with the sizes given in Tables 8 and 9. The above and the following are, of course, also valid for the case of q.
The process of finding the maximum likelihood estimate of the parameter p of a geometric distribution, as given by Eq (14), is a well-known methodology which consists of finding the value p̂ of p that maximizes the likelihood function. For the case of the geometric distribution, given a random sample x_1, ..., x_n, we obtain an estimate p̂ that depends on the data only through x̄, where x̄ denotes the sample mean. From the asymptotic properties of maximum likelihood estimators, see [29,30], p̂ has approximately a normal distribution with mean <p> and variance n^{-1} p^2 (1 − p). Formulas for calculating the error of the mean and of the variance are well known, see [31], although their estimation is usually performed automatically in the background by the scientific software used for the data analysis, in our case Mathematica.
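As a worked sketch of this estimate, assume the convention in which x counts the failures before the first success, so that P(X = x) = (1 − p)^x p for x = 0, 1, 2, ... (this convention is an assumption made here for illustration; if durations are instead counted from 1, x̄ is replaced by x̄ − 1 and the estimate becomes the inverse of the mean duration). The log-likelihood, its maximizer and the asymptotic variance then read:

```latex
\begin{align*}
\ell(p) &= \sum_{i=1}^{n} \log\!\left[(1-p)^{x_i}\, p\right]
         = n \log p + n \bar{x} \log(1-p), \\
\frac{d\ell}{dp} &= \frac{n}{p} - \frac{n \bar{x}}{1-p} = 0
  \quad\Longrightarrow\quad \hat{p} = \frac{1}{1+\bar{x}}, \\
I_n(p) &= -E\!\left[\frac{d^{2}\ell}{dp^{2}}\right]
        = \frac{n}{p^{2}} + \frac{n\,E(X)}{(1-p)^{2}}
        = \frac{n}{p^{2}(1-p)},
  \qquad E(X) = \frac{1-p}{p}, \\
\operatorname{Var}(\hat{p}) &\approx I_n(p)^{-1} = \frac{p^{2}(1-p)}{n}.
\end{align*}
```

The last line agrees with the variance n^{-1} p^2 (1 − p) quoted above, and the same asymptotic variance is obtained under either counting convention.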
To conclude this subsection, we must point out that the measurements showing that p + q is slightly greater than 1 for DJIA and Nikkei are not in contradiction with empirical experience, since, by studying runs at different time scales, a slight excess of uptrends over downtrends has been observed in financial data at least since the 1930s [24][25][26]. Remember also that our two measurements of p and q were performed independently, and that the early financial literature also provides evidence that, at least for some time spans, the evolution of runs is not well represented by a random walk with equal probabilities of going up and down. We believe these empirical facts are well known in financial econometrics, but may not be well known by physicists.
In this paper, we not only confirm these experimental facts and show the time evolution of p, q and p + q; in the next section, we also estimate the fraction of time during which market runs follow a geometric behavior with p = 1/2 and with any p.

Anderson-Darling test in the case p = 0.5
To study dynamically how the theoretical statistical model differs from the empirical data, we calculate the Anderson-Darling statistic for the trend durations of the observed empirical distribution against the theoretical geometric distribution with parameter p = 0.5. Fig 7 displays the obtained p-values of the Anderson-Darling statistic $A_n^2$ for different time periods (not to be confused with the p parameter of the geometric distribution). Remember that, in the case of interest, a p-value is the probability of obtaining a value of $A_n^2$ at least as large as the one actually obtained, given that the probability distribution is actually geometric.
The analysis presented in Fig 7(a) shows that, for the DJIA, the greatest deviations from the geometric distribution with parameter p = 1/2 occurred between the years 2002 and 2011. In Fig 7(b), for Nasdaq, it is observed that, as time goes by, the p-values of the Anderson-Darling statistic show that the empirical data tend to agree better with the geometric distribution G(0.5), especially after the year 2000. Fig 7(c) shows that, similarly to NASDAQ, the agreement between the IPC index data and the geometric distribution increases with time. Finally, Fig 7(d) shows that, for the Nikkei case, the p-values of the Anderson-Darling statistic show good agreement between the geometric model with p = 1/2 and the observed trend duration distribution. As an auxiliary analysis, Fig 8 shows the dates when the events from Fig 7 have a p-value below the α = 0.05 significance level, in other words, the dates for which the null hypothesis can be rejected at a significance level α = 0.05, together with the complementary dates for which the geometric hypothesis with p = q = 1/2 cannot be rejected. The above observations are compatible with the plot shown in Fig 5, presented in subsection Time variation of p and q and other estimates of these parameters, where it is shown that the greatest, although diminishing, deviations from the geometric distribution with p = 0.5 occurred between the years 1980 and 2000, especially for Nasdaq and IPC and to a lesser extent for DJIA and Nikkei. In the next subsection, Anderson-Darling parametric test for the geometric distribution, we shall show that, in all cases, the geometric model still holds if the parameter p is allowed to vary freely.

Anderson-Darling parametric test for the geometric distribution
Let us explore the possibility that trend durations follow a geometric distribution with any parameter p ∈ (0, 1). The results of this parametric test are displayed in Fig 9. For all studied markets, the assumption of a Bernoulli process for price directions holds reasonably well most of the time, except for sporadic deviations that are usually related to extreme market movements, such as those of a financial crisis. Again, Fig 10 is an auxiliary figure that shows the dates when the events from Fig 9 have a p-value below the α = 0.05 significance level, i.e. the dates when the studied markets do not follow the geometric model at that significance level. As can be seen, for a large fraction of the time the markets do seem to follow the geometric model with some parameter p.
An application of the previous results is discussed in the next section, A simple application: Assessing the fraction of time markets runs follow a geometric distribution.

A simple application: Assessing the fraction of time markets runs follow a geometric distribution
Continuing the discussion at the end of section Data analysis, we can also use the results presented there to assess the percentage of time that the market follows the geometric model with p = 1/2 and with a free parameter p. To do this, we propose the following methodology:
1. Calculate a time series of p-values from the sample of trend durations, using the geometric process with p = 0.5 as the null hypothesis.
2. Count the number of points above the significance level α = 0.05.
3. Divide this count by the length of the time series to obtain the fraction of time during which the market has behaved as a market following the geometric process.
Repeat the procedure for the parameter-free case, i.e. with the same null hypothesis but now with any p.
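The counting in steps 2 and 3 can be sketched as follows. This is a minimal illustration, assuming the series of p-values has already been computed for each time window; the function name `fraction_consistent` and the example values are hypothetical and are not the paper's data.

```python
import numpy as np

def fraction_consistent(p_values, alpha=0.05):
    """Fraction of windows whose p-value lies above alpha (steps 2 and 3)."""
    p = np.asarray(p_values, dtype=float)
    p = p[~np.isnan(p)]            # ignore windows without a computed p-value
    if p.size == 0:
        raise ValueError("no p-values available")
    return float(np.mean(p > alpha))

# Made-up p-values for five time windows (illustration only).
example = [0.40, 0.03, 0.12, 0.008, 0.51]
print(fraction_consistent(example))   # 3 of 5 windows are above alpha = 0.05
```

Applying the same function to the p-values of the test with a free parameter p gives the second fraction described in the text.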
Following this criterion, we rank the studied markets by the fraction of time their runs are consistent with the geometric model with p = 1/2: the DJIA comes first, followed by the Nikkei 225, then the NASDAQ Composite and, finally, the IPC. Under this criterion, too, the runs of more mature markets follow the geometric model with p = 1/2 for a longer time, but this time the Nikkei 225 and the DJIA exchange rank positions. The results obtained with this methodology, for the geometric case with p = 1/2 as well as for the parameter-free case, are given in Table 12.
The dates during which market runs do not follow the geometric behavior for p = 1/2, and in the parameter-free case, are displayed in the auxiliary Figs 8 and 10, respectively. The time fractions in Table 12 are the fractions of time during which the markets follow the geometric distribution with p = 1/2 and with a free parameter. The first column of this table shows that, at the 5% significance level, the price-run distribution of the DJIA follows a geometric distribution with p = 1/2 84% of the time, the Nikkei 81%, the Nasdaq 47% and the IPC 37% of the time. The ranking obtained by this criterion, although the DJIA and the Nikkei 225 exchange positions, once again places the more mature markets at the top of the list. The second column of Table 12 shows that the trend-duration distributions of all studied markets are close to a geometric distribution with a parameter p not necessarily equal to 1/2 for a high fraction of the time. This is supported by the quality of the fits and the respective fit parameters different from 1/2 displayed in Fig 4 and Table 7, which show that, with the exception of a few extreme values, the geometric model fits the Nasdaq and IPC run durations well, with parameters p (and q) not necessarily close to 1/2. The results also show that the run distributions of more mature markets stay close to the geometric distribution with p = 1/2 for longer time spans, and that the runs of less mature markets seem to evolve over time towards this same distribution. This empirical result is reminiscent of the fact that worldwide markets increase their efficiency with time [35-38].

Fig 10. Colored points show the dates from the parametric test where the events observed in Fig 9 have a p-value below α = 0.05, so that the geometric model with any p cannot be applied to describe the run-size distribution of the analyzed markets at that significance level. Again, the colored points are enlarged for easier reading.
https://doi.org/10.1371/journal.pone.0270492.g010

Table 12. Fraction of time during which the trend durations of the studied data follow a geometric distribution with parameter p = 0.5, and with any p, both at a significance level of 5%.

... more statistics are needed to study these extreme events that do not seem to follow the geometric model. Additionally, by selecting overlapping and non-overlapping time frames of 200, 252 and 300 trading days, we display the behavior of p and q over time and the distribution of these parameters, which allows us to estimate their mean and RMS values and to compare the former with the corresponding values obtained by a fitting procedure. The agreement obtained is remarkably good. We have shown that, for all markets, the values of p and q are evolving towards 1/2. The evolution of p + q over time is also displayed and, using the same methodology, we observe that <p + q> approaches one over time for all markets. Finally, the markets may be ranked by how closely their uninterrupted-trend durations follow a geometric behavior with p = 1/2 in the following order: Nikkei, DJIA, Nasdaq and IPC, meaning that more mature markets are closer to the geometric behavior with p = 1/2. The Anderson-Darling test has been used to quantify the likelihood that a series of trend durations was generated by a process compatible with the geometric model with p = 1/2; we also employed it to assess for how long trend durations follow the geometric distribution with p = 1/2, as well as with any other value of the parameter p.
In section An 'Efficient Market' toy model for the distribution of run durations, we stated that the geometric model with p = 1/2 applied to price runs may be consistent with the EMH. However, if the empirical analysis falsifies this process, it does not follow that the EMH is falsified. In our opinion, further and deeper study is necessary to clarify this point, given that market efficiency refers to returns and not to price runs.
Finally, besides the above-mentioned problem of making explicit the relation between market efficiency and the geometric behaviour of price runs, we have some additional remarks that may lead to future work. First, in this paper we analyzed regularly sampled data, i.e. daily closing prices, and although the geometric model seems well suited to short and medium trend durations, this observable is really a continuous random variable, and the geometric model might conceivably not be suitable to describe non-regularly sampled data [33], such as tick-by-tick data. The second remark concerns the extreme values observed in the different trend-duration distributions, which occur with a higher probability than expected from the geometric model, as can be seen for trend durations above the cut-off values marked in the different panels of Fig 4 and recorded in the second and fifth columns of Table 7. In the present analysis, we observe at most two of these extreme events in the different panels of Fig 4, which is insufficient to say anything meaningful about the distribution of these outliers. It will also be interesting to study the relation between these extreme runs and extreme return events, particularly financial crashes, for example by using smaller time windows in our analyses. Third, by construction, in any data sample the numbers of downtrends and uptrends must be equal or differ by at most one; nevertheless, the composition of trends shown in Tables 2-5 shows that downtrends predominate among very short trends, whereas among medium and long trends there are more uptrends than downtrends. This asymmetry and its relation to the corresponding returns deserve a more detailed study.
In our opinion, even though data analyses such as the one presented here have a long history, we have found new results of possible interest to the econophysics and financial communities. Specifically, apart from estimating the parameters p and q independently and consistently by two different methods, we show their time evolution as well as the time evolution of their sum p + q. Moreover, we not only show that the run distribution of the studied markets is compatible with the geometric distribution with the estimated parameters, but we also estimate when, and for what fraction of the time, the markets follow this behavior, both for fixed p = q = 0.5 and with a free parameter. The detection of when and for how long the durations of market price runs follow a geometric distribution is, in our opinion, the most important result presented here, from both the academic and the practical points of view; it is not at all obvious that the durations of ascending and descending runs independently follow geometric distributions with, for example, different parameter values.
From an academic point of view, it might be interesting to see what happens to the efficiency of markets when p and q differ significantly: would they adapt their price variations to compensate for the difference in the probabilities of seeing upward or downward uninterrupted trends? Answering this question would be material for another article. On the practical side, these results could be incorporated into various trading systems. In addition, another simple application of our methodology was to rank the analyzed markets according to the fraction of time they follow the geometric behavior in the parametric case. This ranking seems to coincide with the level of efficiency of the studied markets, an issue that also deserves further study, since the EMH is stated in terms of market price variations and not in terms of runs.
Although this was discussed in the last paragraph of the Introduction, we conclude by remarking that empirical results such as those reported here are also important because any adequate market model, agent-based or of any other kind, must reproduce them; see references [4,9].
Supporting information
S1 Dataset. File S1_DataSet.zip contains all analyzed data sets. (ZIP)
S1 Appendix. The discrete version of the Anderson-Darling goodness-of-fit test. (PDF)